基于1-HOP邻居之间的消息传递(MP)范式交换信息的图形神经网络(GNN),以在每一层构建节点表示。原则上,此类网络无法捕获在图形上学习给定任务的可能或必需的远程交互(LRI)。最近,人们对基于变压器的图的开发产生了越来越多的兴趣,这些方法可以考虑超出原始稀疏结构以外的完整节点连接,从而实现了LRI的建模。但是,仅依靠1跳消息传递的MP-gnn与位置特征表示形式结合使用时通常在几个现有的图形基准中表现得更好,因此,限制了Transferter类似体系结构的感知效用和排名。在这里,我们介绍了5个图形学习数据集的远程图基准(LRGB):Pascalvoc-SP,Coco-SP,PCQM-Contact,Peptides-Func和肽结构,可以说需要LRI推理以在给定的任务中实现强大的性能。我们基准测试基线GNN和Graph Transformer网络,以验证捕获长期依赖性的模型在这些任务上的性能明显更好。因此,这些数据集适用于旨在捕获LRI的MP-GNN和Graph Transformer架构的基准测试和探索。
translated by 谷歌翻译
我们提出了一个食谱,讲述了如何建立具有线性复杂性和最先进的结果的一般,功能可扩展的(GPS)图形变压器,并在各种基准测试基准上。 Graph Transformers(GTS)在图形表示学习领域中获得了多种近期出版物的知名度,但它们对构成良好的位置或结构编码的共同基础以及与众不同的区别。在本文中,我们总结了具有更清晰的定义的不同类型的编码,并将其分类为$ \ textit {local} $,$ \ textit {global} $或$ \ textit {fextit {ferseal} $。此外,GTS仍被限制在具有数百个节点的小图上,我们提出了第一个具有复杂性线性的体系结构对节点和边缘$ O(n+e)$的数量,通过将局部实质汇总从完全 - 连接的变压器。我们认为,这种解耦并不会对表现性产生负面影响,而我们的体系结构是图形的通用函数近似器。我们的GPS配方包括选择3种主要成分:(i)位置/结构编码,(ii)局部消息通讯机制和(iii)全局注意机制。我们构建和开源一个模块化框架$ \ textit {graphgps} $,该{GraphGps} $支持多种类型的编码,并且在小图和大图中提供效率和可扩展性。我们在11个基准测试上测试了我们的体系结构,并对所有这些基准显示出非常具竞争力的结果,展示了由模块化和不同策略组合获得的经验益处。
translated by 谷歌翻译
In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises of a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of December 2022, the GitHub repository has reached 2,000 stars and 380 forks, which demonstrates the utility of the proposed open-source framework through the wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics, an additional medium-sized molecular dataset AQSOL, similar to the popular ZINC, but with a real-world measured chemical target, and discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest of exploring more powerful PE for Transformers and GNNs in a robust experimental setting.
translated by 谷歌翻译
机器学习的应用(ML)日益增加,用于许多独特而具有挑战性的科学应用。但是,这些应用面临的至关重要的挑战是它们需要超长延迟和探索器ML功能。鉴于摩尔定律和丹纳德缩放的放缓,再加上科学仪器的快速进步,导致数据速率不断增长,因此需要在极端边缘的超快速ML。边缘的快速ML对于实时减少和过滤科学数据至关重要,以加速科学实验并实现更深刻的见解。为了加速实时科学边缘ML硬件和软件解决方案,我们需要具有足够规格的受限基准任务,以便通常适用且可访问。这些基准可以指导未来Edge ML硬件的设计,用于能够满足纳秒和微秒级延迟要求的科学应用程序。为此,我们介绍了一组科学的ML基准,涵盖了各种ML和嵌入式系统技术。
translated by 谷歌翻译
我们介绍了一个大规模实验,该实验对编码器进行了预处理,其参数计数范围从700m到9.3b不等,随后蒸馏到较小的型号中,范围为17m-170亿参数,其应用到自然语言理解(NLU)组件(NLU)组件(虚拟助手系统。尽管我们使用70%的口语数据训练,但在对书面形式的跨语性自然语言推论(XNLI)语料库进行评估时,我们的教师模型与XLM-R和MT5相当。我们使用系统中的内域数据对教师模型进行了第二阶段的训练,以提高了3.86%的相对分类,而相对7.01%的插槽填充。我们发现,即使是从我们的2阶段教师模型中提取的170亿参数模型,与仅接受公共数据的2.3B参数老师相比,与2.3B参数老师相比,意图分类更好2.88%,并且7.69%的插槽填充错误率更好(第1阶段),强调了。内域数据对训练的重要性。当使用标记的NLU数据进行离线评估时,我们的17m参数阶段2蒸馏模型的表现分别优于XLM-R碱基(85m Params)和Distillbert(42m Params),分别优于4.23%至6.14%。最后,我们介绍了一个完整的虚拟助手实验平台的结果,在该平台中,我们发现使用经过预训练和蒸馏管道训练的模型超过了从8500万参数教师蒸馏的模型,在自动测量全系统用户不满的自动测量中,从8500万参数教师蒸馏出3.74%-4.91%。
translated by 谷歌翻译
机器学习传感器代表了嵌入式机器学习应用程序未来的范式转移。当前的嵌入式机器学习(ML)实例化遭受了复杂的整合,缺乏模块化以及数据流动的隐私和安全问题。本文提出了一个以数据为中心的范式,用于将传感器智能嵌入边缘设备上,以应对这些挑战。我们对“传感器2.0”的愿景需要将传感器输入数据和ML处理从硬件级别隔离到更广泛的系统,并提供一个薄的界面,以模拟传统传感器的功能。这种分离导致模块化且易于使用的ML传感器设备。我们讨论了将ML处理构建到嵌入式系统上控制微处理器的软件堆栈中的标准方法所带来的挑战,以及ML传感器的模块化如何减轻这些问题。 ML传感器提高了隐私和准确性,同时使系统构建者更容易将ML集成到其产品中,以简单的组件。我们提供了预期的ML传感器和说明性数据表的例子,以表现出来,并希望这将建立对话使我们朝着传感器2.0迈进。
translated by 谷歌翻译
我们展示了CFU Playground,这是一个全堆栈的开源框架,可实现用于嵌入式ML系统的机器学习(ML)加速器的快速和迭代设计。我们的工具链紧紧集成开源软件,RTL发电机和FPGA工具,用于综合,地点和路线。此全堆栈开发框架为工程师提供了访问探索定制架构,这些架构是为嵌入式ML定制和共同优化的。快速,部署型材优化反馈循环让ML硬件和软件开发人员在对定制方面相对较小的投资中取得重大回报。使用CFU Playground的设计循环,我们在CPU和加速器之间显示了大量的Speedups(55x-75x)和设计空间探索。
translated by 谷歌翻译
Reliable and cost-effective counting of people in large indoor spaces is a significant challenge with many applications. An emerging approach is to deploy multiple fisheye cameras mounted overhead to monitor the whole space. However, due to the overlapping fields of view, person re-identificaiton (PRID) is critical for the accuracy of counting. While PRID has been thoroughly researched for traditional rectilinear cameras, few methods have been proposed for fisheye cameras and their performance is comparatively lower. To close this performance gap, we propose a multi-feature framework for fisheye PRID where we combine deep-learning, color-based and location-based features by means of novel feature fusion. We evaluate the performance of our framework for various feature combinations on FRIDA, a public fisheye PRID dataset. The results demonstrate that our multi-feature approach outperforms recent appearance-based deep-learning methods by almost 18% points and location-based methods by almost 3% points in accuracy.
translated by 谷歌翻译
We consider a long-term average profit maximizing admission control problem in an M/M/1 queuing system with a known arrival rate but an unknown service rate. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue-length of the system. \cite[Econometrica]{Naor} showed that if all the parameters of the model are known, then it is optimal to use a static threshold policy - admit if the queue-length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full information model of \cite{Naor}. We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{3+\epsilon}(N))$ regret in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of arrivals and $\epsilon>0$.
translated by 谷歌翻译
With the growing global deployment of carbon capture and sequestration technology to combat climate change, monitoring and detection of potential CO2 leakage through existing or storage induced faults are critical to the safe and long-term viability of the technology. Recent work on time-lapse seismic monitoring of CO2 storage has shown promising results in its ability to monitor the growth of the CO2 plume from surface recorded seismic data. However, due to the low sensitivity of seismic imaging to CO2 concentration, additional developments are required to efficiently interpret the seismic images for leakage. In this work, we introduce a binary classification of time-lapse seismic images to delineate CO2 plumes (leakage) using state-of-the-art deep learning models. Additionally, we localize the leakage region of CO2 plumes by leveraging Class Activation Mapping methods.
translated by 谷歌翻译